a is the intercept, which is the value of Y when X = 0.
b is the slope, which is the amount Y changes when X increases by 1.
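As a quick illustration, here's a minimal Python sketch (the values a = 4 and b = 2, and the X values, are made up for illustration, not taken from any real data set) showing how the intercept and slope generate predicted Y values:

```python
# Straight-line model: predicted Y = a + b*X
# The parameter values below are made up for illustration only.
a = 4.0   # intercept: the predicted Y when X = 0
b = 2.0   # slope: how much predicted Y changes when X increases by 1

for x in [0, 1, 2, 3]:
    print(f"X = {x}: predicted Y = {a + b * x}")
# Prints 4.0, 6.0, 8.0, 10.0 -- the line starts at the intercept and rises by the slope
```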
In straight-line regression, our goal is to develop the best-fitting line for our data. Using least-squares
as a guide, the best-fitting line through a set of data is the one that minimizes the sum of the squares
(SSQ) of the residuals. Residuals are the vertical distances of each point from the fitted line, as shown
in Figure 16-2.
FIGURE 16-2: On average, a good-fitting line has smaller residuals than a bad-fitting line.
For curves, finding the best fit is a very complicated mathematical problem. What's nice
about straight-line regression is that it's simple enough that you can calculate the least-squares
parameters from explicit formulas. If you're interested (or if your professor insists that you're
interested), we present a general outline of how those formulas are derived.
Think of a set of data containing $X_i$ and $Y_i$ values, in which i is an index that identifies each observation in the set, as described in Chapter 2. From those data, SSQ can be calculated like this:

$$SSQ = \sum_i \bigl(Y_i - (a + bX_i)\bigr)^2$$
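To make that formula concrete, here's a minimal Python sketch (the five data points and the candidate values of a and b are made up for illustration, not taken from the book's examples) that computes the residuals and their SSQ for one candidate line:

```python
# SSQ = sum of squared residuals for a candidate line Y = a + b*X
# Data and candidate parameters are made-up illustration values.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

a, b = 0.5, 1.8   # one candidate intercept and slope (not necessarily the best)

residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]   # vertical distances from the line
ssq = sum(r ** 2 for r in residuals)

print(f"SSQ for a={a}, b={b}: {ssq:.3f}")
# Least-squares regression looks for the a and b that make this SSQ as small as possible.
```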
If you’re good at first-semester calculus, you can find the values of a and b that minimize SSQ by
setting the partial derivatives of SSQ with respect to a and b equal to 0. If you stink at calculus, trust
that this leads to these two simultaneous equations:

$$aN + b\sum_i X_i = \sum_i Y_i$$

$$a\sum_i X_i + b\sum_i X_i^2 = \sum_i X_i Y_i$$

where N is the number of observed data points.
These equations can be solved for a and b:

$$b = \frac{N\sum_i X_i Y_i - \bigl(\sum_i X_i\bigr)\bigl(\sum_i Y_i\bigr)}{N\sum_i X_i^2 - \bigl(\sum_i X_i\bigr)^2} \qquad a = \frac{\sum_i Y_i - b\sum_i X_i}{N}$$
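Here's a short Python sketch of those explicit formulas (using the same made-up data as above); it computes the least-squares b and a directly from the sums, with no iterative searching:

```python
# Least-squares slope (b) and intercept (a) from the explicit formulas
# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(x)

sum_x  = sum(x)
sum_y  = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)   # slope
a = (sum_y - b * sum_x) / n                                    # intercept

print(f"slope b = {b:.3f}, intercept a = {a:.3f}")
```

If you have SciPy handy, scipy.stats.linregress(x, y) should return the same slope and intercept, which is a convenient check on the arithmetic.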